A two-level classifier for discriminating similar languages

Authors

  • Judit Ács
  • László Grad-Gyenge
  • Thiago Bruno
Abstract

This paper presents the BRUniBP team’s submission to the Discriminating between Similar Languages Shared Task 2015. Our method is a two-phase classifier that uses both character-level and word-level features. The evaluation shows 100% accuracy on language group identification and 93.66% accuracy on language identification. The main contribution of the paper is a memory-efficient, correlation-based feature selection method.
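
The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid model, the cosine similarity, the trigram size, the class names (`CentroidClassifier`, `TwoLevelClassifier`), and the toy sentences are all assumptions chosen for brevity, and the paper's correlation-based feature selection is not shown. The sketch only demonstrates the routing: first predict the language group, then pick a language with a classifier trained on that group alone.

```python
from collections import Counter
import math


def features(text, n=3):
    """Bag of character n-grams plus whole words (illustrative feature set)."""
    padded = f" {text.lower()} "
    feats = Counter(("c", padded[i:i + n]) for i in range(len(padded) - n + 1))
    feats.update(("w", word) for word in text.lower().split())
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Hypothetical stand-in model: one summed feature vector per label."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(features(text))
        return self

    def predict(self, text):
        f = features(text)
        return max(self.centroids, key=lambda lab: cosine(f, self.centroids[lab]))


class TwoLevelClassifier:
    """Level 1 predicts the language group; level 2 the language within it."""

    def fit(self, texts, languages, group_of):
        groups = [group_of[lang] for lang in languages]
        self.group_clf = CentroidClassifier().fit(texts, groups)
        self.lang_clf = {}
        for group in set(groups):
            idx = [i for i, g in enumerate(groups) if g == group]
            self.lang_clf[group] = CentroidClassifier().fit(
                [texts[i] for i in idx], [languages[i] for i in idx])
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        return group, self.lang_clf[group].predict(text)


# Toy data, invented for illustration only (not from the shared task corpus).
train_texts = [
    "o ônibus chegou ao ponto",
    "o autocarro chegou à paragem",
    "el autobús llegó a la parada",
    "el colectivo llegó a la parada",
]
train_langs = ["pt-BR", "pt-PT", "es-ES", "es-AR"]
group_of = {"pt-BR": "pt", "pt-PT": "pt", "es-ES": "es", "es-AR": "es"}

model = TwoLevelClassifier().fit(train_texts, train_langs, group_of)
print(model.predict("o autocarro chegou"))
```

Splitting the decision this way lets the second-level models concentrate on the few features that separate close varieties, which is consistent with the gap the abstract reports between group-level and language-level accuracy.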

Similar resources

Classification of EEG Signals for Discrimination of Two Imagined Words

In this study, a Brain-Computer Interface (BCI) in Silent-Talk application was implemented. The goal was an electroencephalograph (EEG) classifier for three different classes including two imagined words (Man and Red) and the silence. During the experiment, subjects were requested to silently repeat one of the two words or do nothing in a pre-selected random order. EEG signals were recorded by ...

The NRC System for Discriminating Similar Languages

We describe the system built by the National Research Council Canada for the “Discriminating between similar languages” (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classi...

Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties

This article describes the systems submitted by the Citius Ixa Imaxin team to the Discriminating Similar Languages Shared Task 2016. The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. The results of the evaluation show that ranking dictionaries are more sound and stable across different domains while basic Bayesian models perf...

Using Maximum Entropy Models to Discriminate between Similar Languages and Varieties

DSLRAE is a hierarchical classifier for similar written languages and varieties based on maximum-entropy (maxent) classifiers. In the first level, the text is classified into a language group using a simple token-based maxent classifier. At the second level, a group-specific maxent classifier is applied to classify the text as one of the languages or varieties within the previously identified g...

A Simple Baseline for Discriminating Similar Languages

This paper describes an approach to discriminating similar languages using word- and character-based features, submitted as the Queen Mary University of London entry to the Discriminating Similar Languages shared task. Our motivation was to investigate how well a simple, data-driven, linguistically naive method could perform, in order to provide a baseline by which more linguistically complex or kn...

Publication date: 2015